The Corpus for Idiolectal Research (CIDRE)

Abstract

The Corpus for Idiolectal Research (CIDRE) is a collection of fiction works from 11 prolific 19th-century French authors (4 women, 7 men; 22–62 works per author; 37 million words in total). Every work is dated with the year in which it was written. The works were gathered with programming scripts from open-source platforms, for example La Bibliothèque électronique du Québec, and stripped of paratext (text that is not part of the novel itself, e.g. prefaces). We distribute the text files, the dating, other metadata, and the scripts under an open license. CIDRE is the first resource for studying style and idiolect in a diachronic manner (i.e. stylochronometry) on a larger scale.

Similar resources

Slavonic Corpus for Stylometry Research

Stylometry techniques such as authorship recognition, machine translation detection and pedophile identification are used daily in applications for the most widely used languages. But under-represented languages lack data sources usable for stylometry research. In this paper, we propose an algorithm to build corpora containing the meta-information required for stylometry experiments (author informa...

The Search for the Self in Beckett's Theatre: Waiting for Godot and Endgame

This thesis is based upon the works of Samuel Beckett, one of the greatest writers of contemporary literature. Here, I have tried to focus on one of the main themes in Beckett's works: the search for the real "me" or the real self, which is a problem to be solved not only for Beckett's man but also for each of us. I have tried to show Beckett's techniques in approaching this unattainable goal, base...

Some Experiments on Idiolectal Differences among Speakers

It is generally recognized that human listeners can distinguish between speakers who are familiar to them far better than those who are unfamiliar. This increased ability is due no doubt to speaker idiosyncrasies that are recognized by the listener, either consciously or unconsciously. These speaker characteristics offer the possibility to significantly improve automatic speaker recognition per...

Cidre: programming with distributed shared arrays

A programming model that is widely accepted today for large applications is parallel programming with shared variables. We propose an implementation of shared arrays on distributed memory architectures: it provides the user with a uniform addressing scheme while being efficient thanks to a logical paging technique and optimized communication mechanisms.

A speech corpus for multitalker communications research.

Several recent experiments at the Air Force Research Laboratory have investigated the utility of spatial audio displays for augmenting speech intelligibility in multitalker communications environments (Bolia et al., 1999; Nelson et al., 1998a; Nelson et al., 1998b; Simpson et al., 1999). Some of the goals of this research included: (1) an empirical determination of the maximal number of channel...

Journal

Journal title: Journal of Open Humanities Data

Year: 2021

ISSN: 2059-481X

DOI: https://doi.org/10.5334/johd.42